Talk:List of sequence alignment software
This article is rated List-class on Wikipedia's content assessment scale. It is of interest to the following WikiProjects:
Missing read mapper
HISAT2 is missing from the list of short read mappers.[1] It is one of the best mappers for RNA-seq around. I do not have time to add it just now. Nicolas Le Novère (talk) 13:25, 25 October 2017 (UTC)
Conflict of interest edits
Thorwald (talk · contribs) asks in an edit summary why I removed the Geneious entry. The entry was made by Bm richard (talk · contribs), who is a Geneious developer. This type of edit is a conflict of interest and inappropriate on Wikipedia. The same editor's only contributions were to place Geneious spam in other articles. I hope editors here will not tolerate that type of behavior and will remove that entry, as well as any others that were added for self-interested reasons. Other list articles find that link spam is easily controlled by allowing only entries that also have articles. That ensures that each entry is sufficiently notable for inclusion, and any notability decisions happen in one place: the article itself. This article has a ways to go in order to work with a system like that, but I urge the editors here to consider it to reduce abuse of Wikipedia by spammers. JonHarder 19:31, 27 October 2006 (UTC)
- Making the set of list entries and the set of articles coextensive has a problem here - a lot of these programs are useful enough to list but there just isn't enough to say about them to merit independent articles. Maybe they should have more complete descriptions here in lieu of their own pages, except for the very notable ones. I agree with the deletion of the article on Geneious but don't have a particular problem with its inclusion in the list. Opabinia regalis 02:49, 28 October 2006 (UTC)
- I will admit that I know nothing about Geneious. However a quick look at their site reveals that it appears to do "sequence alignment". As this article is a list of sequence alignment software, I believe it should be exhaustive and not just "notable"; this is an encyclopaedia after all. I am also not sure that the author of a programme submitting a link constitutes spam or conflict of interest. If they were to write a glowing article praising its virtues . . . I might think otherwise.--Thorwald 20:00, 28 October 2006 (UTC)
Which software is useful and better?
First, I think sequence alignment and structure alignment are two very different problems. Therefore, they should be described in two separate articles. Second, as a user of structure alignment software, I had a lot of trouble trying to determine which programs actually work and are reliable. There are some benchmarking papers, but they do not say "use this program for that purpose", so I had to spend a lot of time running different tests myself. Finally, I realized that the SSM server (from EBI) is convenient and robust enough to compare remotely related proteins, although it cannot be applied to small proteins and peptides that almost lack secondary structure. Of course, I did not check all the methods listed in this article. So, would it be appropriate if someone anonymous ran a couple of tests on different servers and made a table of their performance? Or maybe such a table has been published already? Such a benchmarking table would be very helpful for potential users like me. I could provide some examples for testing. Biophys 22:46, 31 October 2006 (UTC)
- Your first point is not necessarily true. High sequence similarity generally means structural similarity. Indeed, it has been found over and over again that all the information necessary for the secondary and tertiary structure is found in the sequence. Two identical sequences should yield identical structures. Therefore, these are not different problems. They are married to each other. As such, I believe they should remain under one article.
- As for your second point: This is how computational biology works. Any metric used to determine the accuracy of a programme will generally introduce bias. In science, we don't have absolutes. Therefore, there is no way for anyone to tell you to "use this program for that purpose" . . . it just doesn't work that way.
- On your final point: Yes. SSM is a good server. However, just like every other algorithm/server listed in this article, it has its drawbacks. For example, I prefer to run everything from the CLI (in Linux) and try to avoid servers. Using SSM can be tedious without a well-documented API.--Thorwald 01:20, 1 November 2006 (UTC)
- Yes, protein sequence and structure are certainly related. But I still believe that protein sequence and protein structure are two completely different things, and therefore it is more convenient for the readers of Wikipedia to have them in separate articles. What do other people think about it? Biophys 18:10, 1 November 2006 (UTC)
- I know for sure, based on the results of my testing, that some of the methods in this table (I tried only a few of them) work worse than SSM for remotely related proteins. I think there is nothing bad about it. For example, after the CASP experiments, we know that some protein structure modeling methods work better than others, which is fine. After attending CASP meetings, I generally had a much better idea of "which software should be used for that purpose". Of course, there are important benchmarking tests, although they are not absolutes. But I am not going to insist. That was only a suggestion. What is CLI? I could not find it in the table. Why is it better than SSM? What is the trouble with SSM? Please tell me. Maybe I wrongly trust this server? Biophys 18:10, 1 November 2006 (UTC)
- You wrote that some of these "work worse" than SSM; how are you defining "worse" here? The CASP/CAPRI benchmarks are a good set of 50-80 structures for testing ab initio predictions, but how do you define a "good prediction"? CLI just means "command line interface"; id est, running programmes from the CLI instead of via a web server.--Thorwald 01:35, 2 November 2006 (UTC)
- This is very simple. One program finds a superposition in which the evolutionarily conserved residues in two structures (say Asp and Asp) are structurally superimposed, but another program does not. This is very easy to see when you are working with specific proteins. Such conserved residues can usually be identified from multiple sequence alignments. Of course, I am well aware of such problems as the existence of multiple alternative structural superpositions, or that the number of superimposed residues depends on the selected distance cutoff, etc. One good paper on this subject was published in JMB by Chothia and others (on superpositions of 4-alpha-helical bundles). Actually, I liked the superpositions in the old version of FSSP/DALI and was looking for a convenient server that does "difficult" superpositions at least as well as FSSP/DALI. That was SSM. Maybe there are better programs, but I personally have no time to check. The same goes for other typical users. That is why it is important to have some kind of independent testing. What can I say? "I have tried programs A and B but did not try C and D, and I liked program B better." That is very subjective. Biophys 15:24, 2 November 2006 (UTC)
Other pages like this?
I see the two categories for this page, namely 'bioinformatics' and 'lists of software', but how do I find pages that are also in this combination of categories? (And are there any?) Can we add a link to related pages?
I.e. 'List of sequence annotation software'.
--Dan|(talk) 07:42, 27 June 2007 (UTC)
- For example, I found Software tools for molecular microscopy, which I think should be somehow categorized with this page. --Dan|(talk) 14:10, 27 June 2007 (UTC)
Adding new software
I am interested in writing an article about a sequence alignment program developed by my university, VU Amsterdam, named PRALINE. I am, however, concerned about the conflict of interest rule. The original PRALINE paper can be found here. According to the book Essential Bioinformatics, "...PRALINE is perhaps the most sophisticated and accurate alignment program available. Because of the high complexity of the algorithm, its obvious drawback is the extremely slow computation." I have no plans to include comparisons against other popular SA programs, mostly because there is no downloadable version yet, and therefore it is difficult to benchmark against them. Please let me know what you think. I hope I receive a response within a reasonable amount of time. PervyPirate (talk) 19:20, 27 February 2008 (UTC)
I found out that the program ApE (A Plasmid Editor) can do alignment as well. I like this program because it's simple to use but still offers a lot of functionality (I'm a bachelor's student from Austria). 85.127.93.199 (talk) 18:49, 7 August 2009 (UTC)
Should it not be the license tab?
I find it surprising that there is no licensing information in the software listing. Audriusa (talk) 10:10, 6 September 2009 (UTC)
Read mappers?
Any info? I'm looking for a Global:Local (global in the query) 'read mapper' that takes base quality information into account. So far I haven't found anything... --Dan|(talk) 11:04, 9 November 2009 (UTC)
Indication of availability with Linux distributions?
I am biased, since I develop for Debian, but if there were some distribution-neutral way (Linux flavours plus Mac plus Windows?) to indicate whether a particular package listed here can be installed directly, this might interest readers and make comparing the tools easier. Such availability is not unimportant when tools are ten years old and simply no longer compile with modern compilers or require libraries that are no longer available. The distros help by maintaining such software as a community (not always, not enough), but this again is why I think pages like this and Linux distros should collaborate more. Smoe (talk) 18:24, 4 September 2011 (UTC)
SPAM
The page is full of SPAM. No entries should be there that are not verifiable and include reliable sources to assert notability. -- Alexf(talk) 13:48, 2 March 2017 (UTC)
I am associated with an example of sequence alignment software that I think should be added to the list but it has been suggested to me that this may be a COI so I request an independent opinion.
The user below has a request that an edit be made to List of sequence alignment software. That user has an actual or apparent conflict of interest. The requested edits backlog is very low. There are currently 32 requests waiting for review. Please read the instructions for the parameters used by this template for accepting and declining them, and review the request below and make the edit if it is well sourced, neutral, and follows other Wikipedia guidelines and policies.
- What I think should be changed:
Lalallison (talk) 23:46, 4 November 2024 (UTC)
References
The algorithm is called 'Malign', which is short for 'modelling alignment'. The paper discussing it is at https://doi.org/10.1007/978-3-540-30549-1_19 and the software itself is available at https://github.com/drpowell/Alignment_Prob
I request that an entry for it be added to the table 'Pairwise alignment' with links to the URLs above.
- Why it should be changed:
As background:-
The instance of the algorithm is for pairwise alignment of DNA sequences, covering both global and local alignments, and both optimal and sum-over-all alignments. (The idea can be generalised to other alphabets.) Its novelty is that it incorporates a model (pretty much any model) of a population of sequences that may in themselves be compressible (e.g., as in Plasmodium falciparum).
This addition (i) changes the rank-order of alignments and (ii) provides an information-theoretic criterion as to whether any discovered relationship between the two sequences is statistically significant. This is a better-performing alternative to the so-called 'shuffling and realignment' method.
That being so, I suggest that this also justifies an addition to the 'Sequence alignment' page on this criterion under its 'Assessment of significance' section, but that is a separate request.
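For readers unfamiliar with the 'shuffling and realignment' baseline mentioned above, here is a minimal sketch of it. This is the conventional permutation test, not Malign's information-theoretic criterion: it scores the optimal global alignment of two DNA sequences with a simple Needleman-Wunsch recurrence, then estimates how often shuffled copies of the second sequence score at least as well. The scoring values (match +1, mismatch -1, linear gap -2), the function names and the number of shuffles are illustrative assumptions, not taken from the Malign paper or software.

```python
import random


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global (Needleman-Wunsch) alignment score, linear gap penalty.

    Scoring values are illustrative assumptions, not Malign's model.
    """
    # prev[j] = best score for aligning the first i-1 bases of a with b[:j]
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # curr[0]: the first i bases of a aligned entirely against gaps
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(prev[j - 1] + pair,  # align a[i-1] with b[j-1]
                          prev[j] + gap,       # gap in b
                          curr[j - 1] + gap)   # gap in a
        prev = curr
    return prev[-1]


def shuffle_p_value(a: str, b: str, trials: int = 1000, seed: int = 0) -> float:
    """Empirical p-value: fraction of shuffled b scoring >= the real b."""
    rng = random.Random(seed)
    observed = global_alignment_score(a, b)
    hits = 0
    for _ in range(trials):
        # Shuffling keeps b's base composition but destroys its order.
        letters = list(b)
        rng.shuffle(letters)
        if global_alignment_score(a, "".join(letters)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (trials + 1)


if __name__ == "__main__":
    x = "ACGTTGCAACGTAGCT"
    y = "ACGTTGCATCGTAGCA"
    print(global_alignment_score(x, y), shuffle_p_value(x, y, trials=200))
```

Shuffling preserves single-base composition but not higher-order structure such as repeats, which is relevant to the compressible sequences mentioned in the request.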
On a wider matter:-
I see there has been some 'talk' about whether a piece of software in the list should have links to (i) where the software can be obtained and (ii) a scientific paper about it. As a general matter I think that would be a very good principle.
- References supporting the possible change (format using the "cite" button):
- List-Class Molecular Biology articles
- Unknown-importance Molecular Biology articles
- List-Class MCB articles
- Mid-importance MCB articles
- WikiProject Molecular and Cellular Biology articles
- List-Class Computational Biology articles
- High-importance Computational Biology articles
- WikiProject Computational Biology articles
- All WikiProject Molecular Biology pages
- List-Class List articles
- Low-importance List articles
- WikiProject Lists articles
- List-Class Computing articles
- Low-importance Computing articles
- List-Class software articles
- Low-importance software articles
- List-Class software articles of Low-importance
- All Software articles
- All Computing articles
- Wikipedia conflict of interest edit requests